10 research outputs found
Basins of Attraction, Commitment Sets and Phenotypes of Boolean Networks
The attractors of Boolean networks and their basins have been shown to be
highly relevant for model validation and predictive modelling, e.g., in systems
biology. Yet there are currently very few tools available that are able to
compute and visualise not only attractors but also their basins. In the realm
of asynchronous, non-deterministic modeling not only is the repertoire of
software even more limited, but also the formal notions for basins of
attraction are often lacking. In this setting, the difficulty both for theory
and computation arises from the fact that states may be elements of several
distinct basins. In this paper we address this topic by partitioning the state
space into sets that are committed to the same attractors. These commitment
sets can easily be generalised to sets that are equivalent w.r.t. the long-term
behaviours of pre-selected nodes, which leads us to the notions of markers and
phenotypes, which we illustrate in a case study on bladder tumorigenesis. For
every concept we propose equivalent CTL model checking queries, and an extension
of the state-of-the-art model checking software NuSMV is made available that is
capable of computing the respective sets. All notions are fully integrated as
three new modules in our Python package PyBoolNet, including functions for
visualising the basins, commitment sets and phenotypes as quotient graphs and
pie charts.
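The idea of commitment sets can be illustrated with a minimal sketch in plain Python. It does not use PyBoolNet or NuSMV; the two-variable network, the brute-force reachability search and all names (UPDATE, successors, commitment_sets) are assumptions made only for this illustration. States are grouped by the exact set of attractors they can still reach under asynchronous updates.

```python
from itertools import product

# Hypothetical toy network (not from the paper): x and y inhibit each other.
UPDATE = (
    lambda s: 1 - s[1],  # x := NOT y
    lambda s: 1 - s[0],  # y := NOT x
)

def successors(state):
    """Asynchronous update: exactly one variable changes per transition."""
    result = set()
    for i, f in enumerate(UPDATE):
        target = f(state)
        if target != state[i]:
            result.add(state[:i] + (target,) + state[i + 1:])
    return result

def reachable(state):
    """All states reachable from `state`, including `state` itself."""
    seen, stack = {state}, [state]
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

STATES = list(product((0, 1), repeat=len(UPDATE)))
REACH = {s: reachable(s) for s in STATES}

# A state lies in an attractor if every state it reaches has the same
# reachable set; that reachable set is then the attractor itself.
ATTRACTORS = {
    REACH[s] for s in STATES if all(REACH[t] == REACH[s] for t in REACH[s])
}

def commitment_sets():
    """Partition the state space by the set of attractors each state can reach."""
    groups = {}
    for s in STATES:
        key = frozenset(a for a in ATTRACTORS if a <= REACH[s])
        groups.setdefault(key, set()).add(s)
    return groups

for attractors, states in commitment_sets().items():
    print(sorted(map(sorted, attractors)), "<-", sorted(states))
```

In this toy example the two steady states are each committed to themselves, while the remaining two states can reach both attractors and therefore form a third commitment set. Explicit enumeration like this only scales to very small networks.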
Approximating attractors of Boolean networks by iterative CTL model checking
This paper introduces the notion of approximating asynchronous attractors of
Boolean networks by minimal trap spaces. We define three criteria for
determining the quality of an approximation: “faithfulness” which requires
that the oscillating variables of all attractors in a trap space correspond to
their dimensions, “univocality” which requires that there is a unique
attractor in each trap space, and “completeness” which requires that there are
no attractors outside of a given set of trap spaces. Each is a reachability
property for which we give equivalent model checking queries. Whereas
faithfulness and univocality can be decided by model checking the
corresponding subnetworks, the naive query for completeness must be evaluated
on the full state space. Our main result is an alternative approach which is
based on the iterative refinement of an initially poor approximation. The
algorithm detects so-called autonomous sets in the interaction graph, that is,
sets of variables that contain all their regulators, and considers their intersection
and extension in order to perform model checking on the smallest possible
state spaces. A benchmark, in which we apply the algorithm to 18 published
Boolean networks, is given. In each case, the minimal trap spaces are
faithful, univocal, and complete, which suggests that they are in general good
approximations for the asymptotics of Boolean networks.
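As a rough illustration of minimal trap spaces, the following sketch enumerates all subspaces of a tiny network by brute force and keeps those that are closed under the update functions. This is not the iterative model checking algorithm of the paper; the three-variable network and the names FUNCS, is_trap_space and contains are hypothetical and chosen only for this example.

```python
from itertools import product

# Hypothetical toy network: x and y inhibit each other, z negates itself.
FUNCS = (
    lambda s: 1 - s[1],  # x := NOT y
    lambda s: 1 - s[0],  # y := NOT x
    lambda s: 1 - s[2],  # z := NOT z
)
N = len(FUNCS)

def states_of(space):
    """Enumerate the states of a subspace; None marks a free variable."""
    choices = [(v,) if v is not None else (0, 1) for v in space]
    return product(*choices)

def is_trap_space(space):
    """A subspace is a trap space if no update can change a fixed variable."""
    return all(
        FUNCS[i](s) == v
        for s in states_of(space)
        for i, v in enumerate(space)
        if v is not None
    )

def contains(big, small):
    """True if the subspace `small` is a subset of the subspace `big`."""
    return all(b is None or b == s for b, s in zip(big, small))

TRAPS = [sp for sp in product((0, 1, None), repeat=N) if is_trap_space(sp)]

# Minimal trap spaces contain no other trap space.
MINIMAL = [t for t in TRAPS if not any(u != t and contains(t, u) for u in TRAPS)]

print(MINIMAL)
```

In this example each minimal trap space fixes x and y and leaves z free. The free dimension z is exactly the variable that oscillates in the attractor inside the trap space, which is what the faithfulness criterion asks for.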
Designing miRNA-Based Synthetic Cell Classifier Circuits Using Answer Set Programming
Cell classifier circuits are synthetic biological circuits capable of distinguishing between different cell states depending on specific cellular markers and engendering a state-specific response. An example is a classifier for cancer cells that recognizes whether a cell is healthy or diseased based on its miRNA fingerprint and triggers cell apoptosis in the latter case. Binarization of continuous miRNA expression levels allows a classifier to be formalized as a Boolean function whose output codes for the cell condition. In this framework, the classifier design problem consists of finding a Boolean function capable of reproducing correct labelings of miRNA profiles. The specifications of such a function can then be used as a blueprint for constructing a corresponding circuit in the lab. To find an optimal classifier both in terms of performance and reliability, however, accuracy, design simplicity and constraints derived from the availability of molecular building blocks for the classifiers all need to be taken into account. These complexities translate to computational difficulties, so currently available methods explore only part of the design space and consequently are only capable of calculating locally optimal designs. We present a computational approach for finding globally optimal classifier circuits based on binarized miRNA datasets using Answer Set Programming for efficient scanning of the entire search space. Additionally, the method is capable of computing all optimal solutions, allowing for comparison between optimal classifier designs and identification of key features. Several case studies illustrate the applicability of the approach and highlight the quality of results in comparison with a state-of-the-art method. The method is fully implemented and a comprehensive performance analysis demonstrates its reliability and scalability.
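The design problem can be sketched on a toy scale as follows. The sketch replaces Answer Set Programming with a plain exhaustive search over a deliberately tiny design space (conjunctions of at most two literals). The miRNA names, the binarized profiles and the scoring order (fewest errors first, then fewest literals) are assumptions made for illustration and are not taken from the paper.

```python
from itertools import combinations, product

# Hypothetical binarized data: (miRNA profile, label), label 1 = diseased.
MIRNAS = ("miR_a", "miR_b", "miR_c")
SAMPLES = [
    ((1, 0, 1), 1),
    ((1, 0, 0), 1),
    ((0, 0, 1), 0),
    ((1, 1, 0), 0),
    ((0, 1, 1), 0),
]

def evaluate(conjunction, profile):
    """A classifier is a tuple of (index, required value) literals, ANDed."""
    return int(all(profile[i] == v for i, v in conjunction))

def errors(conjunction):
    """Number of samples whose label the classifier gets wrong."""
    return sum(evaluate(conjunction, p) != label for p, label in SAMPLES)

def candidates(max_literals):
    """Every conjunction with between 1 and max_literals literals."""
    for k in range(1, max_literals + 1):
        for idxs in combinations(range(len(MIRNAS)), k):
            for vals in product((0, 1), repeat=k):
                yield tuple(zip(idxs, vals))

def optimal_classifiers(max_literals=2):
    """Return all candidates with the fewest errors, then the fewest literals."""
    scored = [((errors(c), len(c)), c) for c in candidates(max_literals)]
    best = min(score for score, _ in scored)
    return [c for score, c in scored if score == best]

def describe(conjunction):
    return " AND ".join(
        MIRNAS[i] if v else "NOT " + MIRNAS[i] for i, v in conjunction
    )

OPTIMA = optimal_classifiers()
for c in OPTIMA:
    print(describe(c), "errors:", errors(c))
```

Because the whole design space is scanned, the returned list holds every globally optimal design within that space, which mirrors the goal stated in the abstract. A realistic design space is far too large for this kind of enumeration, which is the motivation for a declarative solver.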
Beiträge zur Analyse von Qualitativen Modellen Genregulatorischer Netzwerke
This thesis addresses three challenges in modeling regulatory and signal
transduction networks. The starting point is the generalized logical formalism as
introduced by R. Thomas and further developed by D. Thieffry, E. H. Snoussi
and M. Kaufman. We introduce the fundamental concepts that make up such
models, the interaction graph and the state transition graph, as well as model
checking, a computer science technique for deciding whether a finite
transition system satisfies a given temporal specification. The first problem
we turn to is that of whether a given model is consistent with time series
data. To do so we introduce query patterns that can be automatically derived
from discretized data. Time series data, being such an abundant source of
information for reverse engineering, has previously been used in the context
of logical models but only under the synchronous, transition-based notion of
consistency. The arguably more realistic asynchronous transition relation has
so far been excluded from such data driven reverse engineering, probably
because the corresponding non-determinism in the transition system introduces
additional obstacles to the already hard problem. Our contribution here is a
path-based notion of consistency between model and data that works for any
transition relation. In particular, we discuss linear time properties like
monotonicity and branching time properties like robustness. The result is a set
of query patterns, similar to but more complex than the ones proposed by P. T.
Monteiro et al. A toolbox called TemporalLogicTimeSeries for the automated
construction of queries from data is also presented. The second problem we
turn to concerns the two types of long-term behaviors that logical models are
capable of producing: steady states, in which the activity levels of all
network components are kept at a fixed value, and cyclic attractors in which
some components are unsteady and produce sustained oscillations. We attempt to
understand the emergence of these behaviors by searching for symbolic steady
states as defined by H. Siebert. Our main contribution is the introduction of
the prime implicant graph, which describes all minimal conditions under which
components may change their activities, and an optimization-based algorithm
for the enumeration of all maximal and minimal symbolic steady states.
Essentially, we generalize the canalizing effects and forcing structure that
were first introduced and studied by S. Kauffman and F. Fogelman in the
context of random Boolean networks. The chapter includes a theorem that
relates symbolic steady states to the existence of positive feedback circuits
in the interaction graph. A toolbox called BoolNetFixpoints, which implements
our algorithm, is also described. The theme of the last chapter is how to deal
with uncertainties that inevitably appear during the modeling of biological
systems. One is often forced to resolve them since most types of analysis
require a single, fully specified model. The knowledge gap is usually filled
by making simplifications or by introducing additional assumptions that are
hard to justify and therefore somewhat arbitrary. The alternative is to work
with and analyze sets of alternative models, rather than single models. This
idea entails additional theoretical and practical challenges: With which
language should we describe our partial knowledge about a system? How can
predictions be made given that each model in the set may behave differently?
How can hypotheses and additional data be added to the current knowledge in a
systematic manner? It seems that there are in principle two different
approaches. The first one is constraint-based and studied by F. Corblin et al.
It translates the partial knowledge and modeling formalism into facts and
rules of a logic program. Common solvers can then deduce additional properties
or test the validity of given queries across all models. In contrast, we
propose to study the pros and cons of an explicit approach that enumerates all
models that agree with a given partial specification. During the first step,
models are enumerated and stored in a database. During a second step, models
are annotated with additional information that is obtained from custom
algorithms. The relationships between the annotations are then analyzed in a
third step. The chapter is based on the prototype implementation
LogicModelClassifier that performs the discussed steps. Throughout, we apply
our results to two previously published models of biological systems. The
first one is a small model of the galactose switch which regulates the
transcription of genes that are involved in the metabolism of yeast. We
address questions that arise during the construction of the model, for example
the number of involved components and their interactions, as well as issues
related to model validation and model revision with time series data. The case
study also discusses different approaches to data discretization. The second
one is a medium size model of the MAPK network studied by D. Thieffry et al.
that is used to predict the cell fate response to different stimuli involving
the growth factors EGF, TGFB, FGF and DNA damage. With the methods developed
in this thesis we can prove that the model is capable of 18 different
asymptotic behaviors, 12 of them steady states and 6 cyclic attractors. The
question of which attractor is reached from which initial state is answered
and we can show that the response in terms of proliferation or growth arrest
and apoptosis is fully determined by the input stimulus.
This thesis deals with three challenges that arise when modeling regulatory
networks and signal transduction. We first describe the logical formalism that
was introduced by R. Thomas and further developed by D. Thieffry, E. H. Snoussi
and M. Kaufman. Its distinguishing feature is that the components of a model
only take values from a finite range. We present the basic objects of a logical
model, the state transition graph and the interaction graph, and discuss model
checking, a method for automatically verifying temporal logic formulas in given
models. The first part of the thesis deals with how to decide whether a given
model is consistent with time series data. To this end we construct various
query patterns by which data can be translated into temporal logics. Time
series data play an important role in the reverse engineering of logical models
from data, but so far only under the assumption that the transitions of the
transition system underlying the model are synchronous. The more realistic
assumption, namely that the activities of the components change asynchronously,
has not yet been studied in this context. This is probably because the
resulting non-deterministic transition systems further complicate an already
difficult problem. Our contribution in this context consists of several
path-based definitions of consistency that can be checked independently of the
chosen transition relation. We discuss the possibility of encoding monotonicity
and robustness assumptions by means of linear-time logic (LTL) and computation
tree logic (CTL). In addition, the toolbox "TemporalLogicTimeSeries" for the
automatic generation of the discussed queries is presented. In the second part
we turn to the long-term behavior and the attractors of logical models. We
attempt to explain the existence of steady states, in which the activities of
all components remain constant, as well as of cyclic attractors, in which some
components are permanently unstable, by means of so-called symbolic steady
states. The results refer to the definitions of H. Siebert. Prime implicants
are introduced as minimal conditions under which discrete functions can change
their value, and the prime implicant graph is presented. The central result is
that symbolic steady states are represented by certain sets of arcs in this
graph. These can be described by 0-1 optimization problems and found with
standard constraint solvers. A script that performs all the described steps is
available under the name "BoolNetFixpoints". In the last part of the thesis we
deal with uncertainties that inevitably arise when modeling biological systems.
One is often forced to resolve them, since most analysis methods require fully
specified models. This is often done by making strong simplifications or by
making assumptions that are hard to justify and therefore arbitrary. The
alternative is to work simultaneously with all models that agree with the
current state of knowledge. This gives rise to additional theoretical and
practical challenges: With which language can models be partially specified?
How can predictions be made if each model may behave differently? How can
additional assumptions and data be added as systematically as possible? In
principle there are two approaches. The constraint programming approach,
implemented by F. Corblin et al., translates the available partial model and
the modeling formalism into facts and rules of a logic program. Standard logic
programming solvers can then check whether or not a property can be derived
from this program. In contrast, we study the advantages and disadvantages of an
explicit approach. Here, all models that are consistent with a given
specification are enumerated and stored in a database. In a second step the
models can be annotated with additional information, whose relationships to one
another are then evaluated in a third step. The chapter is based on the
prototype implementation "LogicModelClassifier", with which the discussed steps
can be carried out. The methods and ideas developed are illustrated on two
models. The first is a small model of the galactose gene switch in yeast, which
is involved in metabolism. We address questions that arise when setting up the
model, for example how many components are needed and how they should interact.
Furthermore, model validation and revision with the help of expression data are
addressed. Different approaches to discretizing the data are compared. The
second is a larger model of the MAPK system, which describes the fate of cancer
cells depending on various environmental influences. These influences include
the growth factors EGF, TGFB and FGF as well as DNA damage. With the methods
and ideas developed in the dissertation we can show that the model is capable
of 18 different responses. 12 of them are steady states and 6 are cyclic
attractors. The question of which attractor can be reached from which initial
state is answered, and we can show that the asymptotic behavior of the model,
with respect to the decision between cell growth and cell death, is fully
determined by the initial conditions.
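The path-based notion of consistency described in the thesis abstract can be sketched as a reachability check that takes the transition relation as a parameter. This is only a simplified illustration and not the temporal logic queries generated by TemporalLogicTimeSeries; the toy network, the use of None for unmeasured components and all function names are assumptions.

```python
from itertools import product

# Hypothetical toy network: x and y inhibit each other.
FUNCS = (
    lambda s: 1 - s[1],  # x := NOT y
    lambda s: 1 - s[0],  # y := NOT x
)

def async_successors(state):
    """Asynchronous relation: one component is updated per transition."""
    out = set()
    for i, f in enumerate(FUNCS):
        v = f(state)
        if v != state[i]:
            out.add(state[:i] + (v,) + state[i + 1:])
    return out

def sync_successors(state):
    """Synchronous relation: all components are updated at once."""
    return {tuple(f(state) for f in FUNCS)}

def reachable(starts, successors):
    """States reachable from any state in `starts` (including `starts`)."""
    seen, stack = set(starts), list(starts)
    while stack:
        for nxt in successors(stack.pop()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def matches(state, observation):
    """An observation fixes some components; None means 'not measured'."""
    return all(o is None or o == v for v, o in zip(state, observation))

def is_consistent(observations, successors):
    """True if some path visits states matching the observations in order."""
    all_states = product((0, 1), repeat=len(FUNCS))
    current = {s for s in all_states if matches(s, observations[0])}
    for obs in observations[1:]:
        current = {s for s in reachable(current, successors) if matches(s, obs)}
        if not current:
            return False
    return bool(current)

series = [(0, 0), (0, 1)]
print("asynchronous:", is_consistent(series, async_successors))
print("synchronous: ", is_consistent(series, sync_successors))
```

The same time series can be consistent under one transition relation and inconsistent under another, which is why a definition that is independent of the transition relation is useful.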
Presentation_1_Designing miRNA-Based Synthetic Cell Classifier Circuits Using Answer Set Programming.PDF
Cell classifier circuits are synthetic biological circuits capable of distinguishing between different cell states depending on specific cellular markers and engendering a state-specific response. An example is a classifier for cancer cells that recognizes whether a cell is healthy or diseased based on its miRNA fingerprint and triggers cell apoptosis in the latter case. Binarization of continuous miRNA expression levels allows a classifier to be formalized as a Boolean function whose output codes for the cell condition. In this framework, the classifier design problem consists of finding a Boolean function capable of reproducing correct labelings of miRNA profiles. The specifications of such a function can then be used as a blueprint for constructing a corresponding circuit in the lab. To find an optimal classifier both in terms of performance and reliability, however, accuracy, design simplicity and constraints derived from the availability of molecular building blocks for the classifiers all need to be taken into account. These complexities translate to computational difficulties, so currently available methods explore only part of the design space and consequently are only capable of calculating locally optimal designs. We present a computational approach for finding globally optimal classifier circuits based on binarized miRNA datasets using Answer Set Programming for efficient scanning of the entire search space. Additionally, the method is capable of computing all optimal solutions, allowing for comparison between optimal classifier designs and identification of key features. Several case studies illustrate the applicability of the approach and highlight the quality of results in comparison with a state-of-the-art method. The method is fully implemented and a comprehensive performance analysis demonstrates its reliability and scalability.